Exploration

This analysis examines the FIFA World Cup matches dataset to identify trends in attendance and to gain insight into the performance of home teams. The dataset is an archive of past matches and records the teams, scores, dates, venues, and attendance for each game. After inspecting and summarizing the data, we turn to the visualizations: a bar plot of home-team goals by team and a plot that charts attendance over time. Together, these brief analyses aim to surface the patterns that characterize the FIFA World Cup and the stories behind individual games.

Data Exploration

Because the FIFA World Cup dataset is large (4,572 rows and 20 columns), an initial exploration is needed before any detailed analysis. As a first step, we created a bar plot showing how home-team goals are distributed across teams. The plot gives a quick view of goal-scoring tendencies, highlighting standout teams and possible trends. Notably, the color scale shows high-scoring home teams in red and lower-scoring teams in blue. This initial exploration provides a visual overview of goal dynamics in the World Cup matches dataset and lays the groundwork for the more detailed analysis that follows.

# Read the FIFA World Cup matches dataset
fifa_matches <- read.csv("/Users/jeeveshrajgupta/Desktop/FP_Jeevesh/WorldCupMatches.csv")

# Display the structure of the dataset
str(fifa_matches)
## 'data.frame':    4572 obs. of  20 variables:
##  $ Year                : int  1930 1930 1930 1930 1930 1930 1930 1930 1930 1930 ...
##  $ Datetime            : chr  "13 Jul 1930 - 15:00 " "13 Jul 1930 - 15:00 " "14 Jul 1930 - 12:45 " "14 Jul 1930 - 14:50 " ...
##  $ Stage               : chr  "Group 1" "Group 4" "Group 2" "Group 3" ...
##  $ Stadium             : chr  "Pocitos" "Parque Central" "Parque Central" "Pocitos" ...
##  $ City                : chr  "Montevideo " "Montevideo " "Montevideo " "Montevideo " ...
##  $ Home.Team.Name      : chr  "France" "USA" "Yugoslavia" "Romania" ...
##  $ Home.Team.Goals     : int  4 3 2 3 1 3 4 3 1 1 ...
##  $ Away.Team.Goals     : int  1 0 1 1 0 0 0 0 0 0 ...
##  $ Away.Team.Name      : chr  "Mexico" "Belgium" "Brazil" "Peru" ...
##  $ Win.conditions      : chr  " " " " " " " " ...
##  $ Attendance          : int  4444 18346 24059 2549 23409 9249 18306 18306 57735 2000 ...
##  $ Half.time.Home.Goals: int  3 2 2 1 0 1 0 2 0 0 ...
##  $ Half.time.Away.Goals: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Referee             : chr  "LOMBARDI Domingo (URU)" "MACIAS Jose (ARG)" "TEJADA Anibal (URU)" "WARNKEN Alberto (CHI)" ...
##  $ Assistant.1         : chr  "CRISTOPHE Henry (BEL)" "MATEUCCI Francisco (URU)" "VALLARINO Ricardo (URU)" "LANGENUS Jean (BEL)" ...
##  $ Assistant.2         : chr  "REGO Gilberto (BRA)" "WARNKEN Alberto (CHI)" "BALWAY Thomas (FRA)" "MATEUCCI Francisco (URU)" ...
##  $ RoundID             : int  201 201 201 201 201 201 201 201 201 201 ...
##  $ MatchID             : int  1096 1090 1093 1098 1085 1095 1092 1097 1099 1094 ...
##  $ Home.Team.Initials  : chr  "FRA" "USA" "YUG" "ROU" ...
##  $ Away.Team.Initials  : chr  "MEX" "BEL" "BRA" "PER" ...
# Check the column names in the dataset
colnames_fifa <- colnames(fifa_matches)

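# Define the columns to summarize.
# Note: read.csv() converts spaces in header names to dots, so the dataset's
# columns are "Home.Team.Goals" and "Away.Team.Goals"; the space-separated
# names below are therefore reported as missing.
summary_columns <- c("Home Team Goals", "Away Team Goals", "Attendance")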

# Check if the specified columns exist in the dataset
missing_columns <- setdiff(summary_columns, colnames_fifa)

# Warn if any of the requested columns are missing
if (length(missing_columns) > 0) {
  warning(paste("The following columns are missing in the dataset:", paste(missing_columns, collapse = ", ")))
}
## Warning: The following columns are missing in the dataset: Home Team Goals,
## Away Team Goals
# Filter out the columns that exist in the dataset
existing_columns <- intersect(summary_columns, colnames_fifa)

# Generate summary statistics for the requested columns that exist
# (only Attendance matches here, so the result is a single-column summary)
summary(fifa_matches[, existing_columns])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    2000   30000   41580   45165   61374  173850    3722
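The missing-column warning arises because read.csv() passes header names through make.names() by default, which replaces spaces with dots. Below is a minimal sketch of the same check with matching names. The inline CSV and the name `toy` are assumptions for illustration; the three rows are the first three matches shown in the str() output.

```r
# Inline CSV with the original space-separated headers; the three rows
# are the first three matches shown in the str() output.
csv_text <- "Home Team Goals,Away Team Goals,Attendance
4,1,4444
3,0,18346
2,1,24059"

# read.csv() uses check.names = TRUE by default, so spaces become dots
toy <- read.csv(text = csv_text)
print(colnames(toy))

# Build the requested names the same way read.csv() does
summary_columns <- make.names(c("Home Team Goals", "Away Team Goals", "Attendance"))
missing_columns <- setdiff(summary_columns, colnames(toy))

# With matching names nothing is missing and all three columns are summarised
print(missing_columns)
print(summary(toy[, summary_columns]))
```

An alternative is to read the file with check.names = FALSE and keep the spaces, at the cost of needing backticks around the column names in formulas and dplyr verbs.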
# Convert the 'Datetime' column to Date; the time of day is discarded,
# and %b expects English month abbreviations such as "Jul"
fifa_matches$Datetime <- as.Date(fifa_matches$Datetime, format = "%d %b %Y - %H:%M")
## Warning: Removed 3722 rows containing missing values (`position_stack()`).
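The Datetime conversion can be checked on a couple of values copied from the str() output. This is a minimal sketch: the vector name `x` is an assumption for illustration, and the month abbreviation is assumed to parse under an English or C locale. It also shows as.POSIXct() as one way to keep the kick-off time, which as.Date() drops.

```r
# Two Datetime strings copied from the str() output (note the trailing space)
x <- c("13 Jul 1930 - 15:00 ", "14 Jul 1930 - 12:45 ")

# as.Date() keeps only the calendar date
dates <- as.Date(trimws(x), format = "%d %b %Y - %H:%M")
print(format(dates))

# as.POSIXct() keeps the time of day; the UTC time zone is an assumption,
# since the dataset does not state one
times <- as.POSIXct(trimws(x), format = "%d %b %Y - %H:%M", tz = "UTC")
print(format(times, "%H:%M"))
```

Rows whose Datetime is blank come back as NA from both functions, so a count of NA values after conversion is a quick way to spot empty records.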

library(plotly)

plot_ly(data = fifa_matches, x = ~Home.Team.Name, y = ~Home.Team.Goals,
        type = "bar", color = ~Home.Team.Name) %>%
  layout(
    title = "Home Team Performances",
    xaxis = list(
      tickangle = 45,
      tickmode = "array",
      tickvals = seq(1, nrow(fifa_matches), 5),
      ticktext = fifa_matches$Home.Team.Name[seq(1, nrow(fifa_matches), 5)]
    )
  )
## Warning: Ignoring 3720 observations
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
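The warnings above appear to stem from two things: blank records with missing values, and a colour mapping across more teams than the Set2 palette provides. The following is a hedged sketch of one way to avoid both, by dropping blank records and totalling goals per team before plotting. The data frame `toy_matches` is an assumption for illustration: its first four rows come from the str() output, the second France row is made up to show the summing, and the NA rows mimic blank records.

```r
# Toy rows: the first four come from the str() output; the second France
# row is made up to show summing, and the NA rows mimic blank records.
toy_matches <- data.frame(
  Home.Team.Name  = c("France", "USA", "Yugoslavia", "Romania", "France", NA, NA),
  Home.Team.Goals = c(4, 3, 2, 3, 1, NA, NA)
)

# Drop blank records, then total the home goals per team
clean <- toy_matches[!is.na(toy_matches$Home.Team.Name), ]
team_goals <- aggregate(Home.Team.Goals ~ Home.Team.Name, data = clean, FUN = sum)
print(team_goals)

# One bar per team in a single colour; skipped if plotly is not installed.
# Type `p` at the console to display the chart.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(team_goals, x = ~Home.Team.Name, y = ~Home.Team.Goals, type = "bar")
  p <- plotly::layout(p, title = "Home Team Performances", xaxis = list(tickangle = 45))
}
```

Aggregating first means each team contributes a single bar, so the axis labels come from the team names themselves and no manual tick positions are needed.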
library(dplyr)  # provides group_by() and summarise()
library(gt)

# Calculate total attendance for each year
total_attendance <- fifa_matches %>%
  group_by(Year) %>%
  summarise(Total_Attendance = sum(Attendance, na.rm = TRUE))

# Print the table
total_attendance %>%
  gt() %>%
  tab_header(
    title = "Total Attendance Summary",
    subtitle = NULL
  ) %>%
  fmt_number(
    columns = c(Total_Attendance),
    decimals = 0
  )
Total Attendance Summary

Year    Total_Attendance
1930             590,549
1934             363,000
1938             375,700
1950           1,045,246
1954             768,607
1958             819,810
1962             893,172
1966           1,563,135
1970           1,603,975
1974           1,865,753
1978           1,545,791
1982           2,109,723
1986           2,394,031
1990           2,516,215
1994           3,587,538
1998           2,785,100
2002           2,705,197
2006           3,359,439
2010           3,178,856
2014           4,319,243
NA                     0

In-Depth Analysis

The FIFA World Cup attendance bar plot shows clear patterns over time: total attendance has risen broadly since the 1950s, with a local peak in 1994 (about 3.6 million) and the highest total in 2014 (about 4.3 million). This suggests the tournament's resilience and its ability to connect with a worldwide audience, with high points that coincide with important turning points in soccer history. The interactive bar plot of home-team performances, in turn, provides information about goal scoring. Established soccer powers such as Brazil, Germany, and Italy stand out consistently, which suggests a strong home advantage. Home-team goals also appear to rise for hosting countries, which points to the effect of home-field advantage on performance. Taken together, these graphics offer a glimpse into the changing dynamics and worldwide popularity of the FIFA World Cup.
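The attendance trend can be checked directly against the yearly totals in the Total Attendance Summary table. This is a minimal sketch in base R: the data frame `attendance` and the columns and variables derived from it are assumptions for illustration, and the values are copied from the table with the NA row left out.

```r
# Yearly totals copied from the Total Attendance Summary table (NA row omitted)
attendance <- data.frame(
  Year = c(1930, 1934, 1938, 1950, 1954, 1958, 1962, 1966, 1970, 1974,
           1978, 1982, 1986, 1990, 1994, 1998, 2002, 2006, 2010, 2014),
  Total_Attendance = c(590549, 363000, 375700, 1045246, 768607, 819810,
                       893172, 1563135, 1603975, 1865753, 1545791, 2109723,
                       2394031, 2516215, 3587538, 2785100, 2705197, 3359439,
                       3178856, 4319243)
)

# Change in total attendance relative to the previous tournament
attendance$Change <- c(NA, diff(attendance$Total_Attendance))

# Tournament with the highest total, and tournaments where the total fell
peak_year <- attendance$Year[which.max(attendance$Total_Attendance)]
declines  <- attendance$Year[which(attendance$Change < 0)]
print(peak_year)
print(declines)

# Line plot of total attendance by tournament year
plot(attendance$Year, attendance$Total_Attendance, type = "b",
     xlab = "Year", ylab = "Total attendance",
     main = "Total World Cup Attendance by Year")
```

Listing the tournaments where attendance fell makes it easier to describe the trend accurately as a broad rise with interruptions.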